Objective. To determine whether gene expression 
profiles identified in peripheral whole blood samples 
could be used to determine therapeutic outcome in a 
cohort of children with newly diagnosed polyarticular 
juvenile idiopathic arthritis (JIA). 

Methods. Whole blood samples from the Trial of 
Early Aggressive Therapy (TREAT) in JIA patients were 
analyzed on Illumina microarrays, and differential gene 
expression was compared to expression in healthy con- 
trols. Microarray results were validated by real-time 
quantitative polymerase chain reaction in an indepen- 
dent cohort of samples. Pathway analysis software was 
used to characterize gene expression profiles. Support 
vector machines were used to develop predictive models 
for different patient classes. 

Results. Differential gene expression profiles for 
rheumatoid factor (RF)-positive and RF-negative pa- 
tients were remarkably similar. Pathway analysis re- 
vealed a broad range of affected pathways, consistent 
with current mechanistic theories. Modeling showed 
that the prognosis at 6 months was strongly linked to 
gene expression at presentation, irrespective of treat- 
ment. 

Conclusion. Gene expression is linked to thera- 
peutic outcome, and gene expression in the peripheral 
blood may be a suitable target for a prognostic test. 

The completion of sequencing of the human 
genome was lauded as the necessary first step toward 
developing specific, patient-tailored treatments for 
many complex diseases (1). The development of "per- 
sonalized medicine" is considered highly desirable be- 
cause, for many of the most vexing diseases in industri- 
alized societies, there is a broad spectrum of individual 
therapeutic responses to any given empirically derived 
treatment approach. We know, for example, that some 
patients with rheumatoid arthritis (RA) will have an 
excellent and sustained response to methotrexate 
(MTX), while others will fail to have satisfactory func- 
tional outcomes until biologic agents, usually anti-tumor 
necrosis factor (anti-TNF) therapies, are initiated (2,3). 
It would be highly desirable to know which patients are 
going to need more-aggressive therapies from the outset 
so that we can minimize the human and economic toll 
that diseases such as RA carry with them. 

To date, numerous attempts have been made to 
develop predictive biomarkers of therapeutic response 
in human illnesses, that is, to develop strategies for 
implementing the "personalized medicine" that has been 
a 10-year goal of physicians and scientists. Among 
the most promising tools that have been used toward this 
goal is gene expression profiling, the survey of genes 
expressed or suppressed in a particular cell type, tissue, 
or clinical sample. While there has been some success in 
developing specific chemotherapeutic strategies for can- 
cer using this approach (4,5), similar attempts for rheu- 
matic diseases (typically using mixed cells from the 
peripheral blood) have yielded disappointing results, 
largely because the initial findings were not corrobo- 
rated in independent cohorts. 

Until now, no attempt has been made to develop 
therapeutic biomarkers for the childhood forms of ar- 
thritis using gene expression profiling. The importance 
of doing so is illustrated by the fact that these diseases 
are among the most common chronic ones in children 
(6-8) and continue to result in serious functional limi- 
tations. Like adult RA, which it resembles phenotypi- 
cally, the polyarticular form of juvenile idiopathic arthri- 
tis (JIA) displays considerable heterogeneity in terms of 
response to standard therapies (9-11). Thus, in the field 
of pediatrics, finding biomarkers that can predict thera- 
peutic response at presentation or early in therapy is 
expected to have an important effect on our ability to 
treat the disease and restore/preserve function and 
normal childhood activities. 

The Trial of Early Aggressive Therapy (TREAT) 
in JIA patients is a recently completed, NIH-funded 
clinical trial (12) comparing 2 therapeutic regimens for 
the treatment of newly diagnosed polyarticular JIA: one 
arm used subcutaneous (SC) MTX as initial therapy, 
and the other arm used a combined regimen of subcu- 
taneous MTX, a TNF inhibitor (etanercept), and oral 
prednisolone (tapered to 0 by 17 weeks). As part of the 
TREAT in JIA trial, whole blood was collected from 
consenting participants for RNA expression studies at 
specific time points during the course of the first year of 
therapy. We report here the results of the expression 
profile analysis using whole-genome microarrays, as 
confirmed by the study of an independent cohort derived 
from the Children's Rheumatology Clinic at the Univer- 
sity of Oklahoma. 

PATIENTS AND METHODS 

Samples from patients in the TREAT in JIA study. 

Eighty-five patients were recruited into the TREAT in JIA 
trial between October 2007 and November 2009 (12). All 
children met the international criteria for polyarticular-onset 
JIA (13). Sixty-two parents of these children gave written, 
informed consent for providing these samples for translational 
uses, and children 7 years of age or older gave assent to 
participate in the study. Approval for use of the specimens was 
given by the TREAT in JIA study oversight committee. The 
patients submitting samples for this current study consisted of 
19 boys and 43 girls, ages 2-14 years. Four of the boys and 17 
of the girls were rheumatoid factor (RF) positive. At the time 
of enrollment (month 0) and prior to treatment, 2.5 ml of 
blood was collected into a PAXgene tube (PreAnalytiX). 
Samples were stored at -80°C. (A summary of patient 
characteristics is available upon request from the corresponding 
author.) 

Patients were randomly assigned to 1 of 2 blinded, 
aggressive treatment arms of the study. Arm 1 consisted of 
treatment with MTX 0.5 mg/kg/week SC plus etanercept 0.8 
mg/kg/week SC (maximum dosage 50 mg/week) in combina- 
tion with oral prednisolone (0.5 mg/kg/day; maximum dosage 
60 mg/day) for 16 weeks. Arm 2 consisted of MTX 0.5 
mg/kg/week SC (40 mg maximum) plus placebo etanercept SC 
weekly and placebo oral prednisolone tapered to 0 by 17 
weeks. At 4 months, patients who did not achieve American 
College of Rheumatology (ACR) Pediatric 70 (Pedi 70) im- 
provement from baseline were treated (or retreated) with 
open-label MTX, etanercept, and prednisolone. At 6 months, 
patients who did not achieve clinically inactive disease were 
changed to treatment with open-label MTX, etanercept, and 
prednisolone, if they were not already receiving this treatment. 

Further specimens were collected during visits at 4 
months, 6 months, and 12 months after enrollment (month 0). 
For purposes of the TREAT in JIA study, inactive disease was 
defined as no evidence of synovitis, absence of fever, rash, 
lymphadenopathy, and splenomegaly, no active uveitis, normal 
erythrocyte sedimentation rate or C-reactive protein level, and 
a physician's global assessment score indicating no active 
disease. 

Samples from healthy control subjects. Controls con- 
sisted of 8 healthy female and 11 healthy male children 
between the ages of 7 and 13 years who were recruited from 
the University of Oklahoma Children's Physicians General 
Pediatrics Clinic. The protocol for obtaining these specimens 
was approved by the University of Oklahoma Institutional 
Review Board (no. 13205). Anesthesia for the phlebotomy was 
provided using topical lidocaine/prilocaine solution. 

RNA processing. RNA was purified from whole blood 
PAXgene specimens using a PAXgene blood RNA kit (Qia- 
gen) as recommended by the manufacturer, including a DNase 
(Qiagen) step to remove genomic DNA. Globin transcripts, 
which reduce labeling efficiency of whole blood cell RNA and 
decrease signal-to-noise ratios on microarrays (14), were re- 
duced using GlobinClear (human; Ambion). Final RNA prep- 
arations were suspended in RNase-free water, quantified 
spectrophotometrically, and analyzed for RNA integrity by 
capillary gel electrophoresis (Agilent 2100 Bioanalyzer). 

Due to technical issues such as RNA degradation, not 
all 62 samples from the TREAT in JIA study were available for 
microarray analysis at both month 0 and month 4. Figure 1 
shows a schematic representation of the study, including which 
samples were analyzed at each time point. A total of 44 
samples were available for microarray analysis at month 0 and 
49 samples at month 4. 

Microarray analysis. Complementary RNA was pro- 
duced from reverse-transcribed complementary DNA using an 
Illumina TotalPrep RNA amplification kit (Ambion), hybrid- 
ized to Illumina WG-6 v3 or Illumina HT-12 v4 human whole 
genome microarrays, and stained according to the manufac- 
turer's directions. Microarray hybridizations were undertaken 
in 2 separate batches. The first batch consisted of samples from 
the 19 healthy controls as well as samples from the JIA 
patients: 26 obtained at month 0, 2 at month 4, and 1 at month 
12. These samples were hybridized on Illumina WG-6 v3 
arrays. The second batch consisted of the remaining 18 JIA 
patient samples from month 0 and 47 JIA patient samples from 
month 4. These samples were hybridized on Illumina HT-12 v4 
arrays. Complementary RNA preparation and hybridizations 
of the second batch were carried out 12 months after the first 
batch. 

Figure 1. Schematic representation of treatment regimens at month 0 (baseline) and month 4, with clinical outcomes at month 6 and month 12, 
in patients with juvenile idiopathic arthritis (JIA), according to the presence or absence of rheumatoid factor (RF). Each column of symbols 
represents a single patient at different stages of the Trial of Early Aggressive Therapy (TREAT) in JIA study. Study arm 1 consisted of treatment 
with methotrexate (MTX) plus etanercept (ET), as well as prednisolone. Study arm 2 consisted of MTX only. At months 6 and 12, patients were 
assessed for the presence of clinically inactive disease (CID) or active disease (AD). 

Validation of differential gene expression by real-time 
quantitative reverse transcription-polymerase chain reaction 
(qRT-PCR) in an independent patient cohort. An additional 
cohort of samples was collected for qRT-PCR analysis in order 
to provide independent validation of the results of gene 
expression analyses carried out with the TREAT in JIA 
samples. These whole blood PAXgene specimens were obtained 
from an independent cohort of 8 children with untreated, 
RF- polyarticular JIA recruited from the University 
of Oklahoma Health Sciences Center Pediatric Rheumatology 
Clinic. These children ranged in age from 2 to 11 years and 
consisted of 2 boys and 6 girls. These samples were collected at 
month 0, prior to treatment. Nine genes that showed signifi- 
cant differential expression in the microarray results and that 
are known to be associated with rheumatoid disease were 
analyzed by qRT-PCR. 

Total RNA (0.9 μg) was reverse transcribed with the 
use of an iScript cDNA synthesis kit (Bio-Rad) according to 
the manufacturer's instructions. Relative levels of target gene 
transcripts were assayed in triplicate using real-time qRT-PCR 
with SYBR Green reagents and a StepOne Plus PCR system 
(Applied Biosystems). The temperature profile consisted of an 
initial step at 95°C for 10 minutes, followed by 40 cycles of 95°C 
for 15 seconds, 60°C for 1 minute, and then a final melting 
curve analysis with a ramp from 60°C to 95°C over 20 minutes. 
Gene-specific amplification was confirmed by a single peak in 
the ABI Dissociation Curve software. The relative abundance 
of transcript expression data was normalized to GAPDH 
expression. Results are presented as the ratio of the concen- 
tration of messenger RNA (mRNA) relative to GAPDH 
mRNA ($2^{-\Delta C_t}$). Statistical analysis was performed on the $\Delta C_t$ 
values using unpaired t-tests. Primers were synthesized by 
Integrated DNA Technologies. 
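
The normalization described above can be illustrated with a minimal sketch. It assumes that triplicate threshold cycle (Ct) values are averaged before subtraction and that amplification efficiency is close to 100%; the function names and the Ct values are hypothetical and are not data from this study.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """Delta Ct: mean target Ct minus mean reference (e.g., GAPDH) Ct."""
    return mean(target_ct) - mean(reference_ct)


def relative_expression(target_ct, reference_ct):
    """Target mRNA relative to reference mRNA, computed as 2^(-delta Ct).

    Assumes roughly 100% amplification efficiency (a doubling per cycle).
    """
    return 2.0 ** (-delta_ct(target_ct, reference_ct))


# Hypothetical triplicate Ct values for one sample (illustration only).
target = [24.0, 24.2, 23.8]
gapdh = [18.0, 18.1, 17.9]
print(delta_ct(target, gapdh))
print(relative_expression(target, gapdh))
```

Because the t-tests are run on the delta Ct values rather than on the ratios, the comparison is made on the log scale, where the values are more nearly symmetric.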

The nucleotide sequences of the primers were as 
follows: for CD44, 5'-CATCCAACACCTCCCAGTATG-3' 
(sense) and 5'-CTGCTCACGTCATCATCAGTAG-3' (antisense); 
for exocyst complex component 4 (Exo-4), 
5'-TTGATGTTACAAACCTCCCTACTC-3' (sense) and 
5'-CCAAGCCCTTAATGAGGATACC-3' (antisense); for macrophage 
migration inhibitory factor (MIF), 5'-GTCCCGGACCAGCTCAT-3' 
(sense) and 5'-GCCGCGTTCATGTCGTAATA-3' (antisense); 
for NF-κB1, 5'-CTGCTGTGCAGGATGAGAAT-3' (sense) and 
5'-AAATCCTCCACCACATCTTCC-3' (antisense); for 
peptidylarginine deiminase 4 (PADI-4), 
5'-CCAGGTCTGAGATGGACAAAG-3' (sense) and 
5'-AGGGAGATGGTGAGGGTAAT-3' (antisense); for 
poly(ADP-ribose) polymerase (PARP-1), 
5'-GTAGCAACAAACTGGAACAGATG-3' (sense) and 
5'-GGACTTGGTGCCAGGATTTA-3' (antisense); for protein 
tyrosine phosphatase receptor type C (PTPRC), 
5'-CGTAATGGAAGTGCTGCAATG-3' (sense) and 
5'-TGCGACTCATTTCTAACCAGAG-3' (antisense); for serpin A, 
5'-AATGCCACCGCCATCTT-3' (sense) and 
5'-CCCATTGCTGAAGACCTTAGT-3' (antisense); and for STAT-6, 
5'-CAAGTTTAAGACAGGCTTGCG-3' (sense) and 
5'-TCTTCAGCACTAGGGCTTTG-3' (antisense). All primers were 
tested to display an efficiency of ~95% (±SD 2%). 

Figure 2. Scatterplots of data from principal components analyses (PCAs), showing the distribution of the healthy control (HC) samples and the 
Trial of Early Aggressive Therapy (TREAT) in juvenile idiopathic arthritis (JIA) patient samples, whether positive or negative for rheumatoid factor 
(RF), through the first 2 principal components. A, After normalization of raw data, the healthy controls and patient samples show a good degree 
of separation, but a strong batch effect is apparent between the 2 array batches in the patient samples. B, After the ComBat algorithm was applied 
to the data, the batch effect was removed. 

Support vector machine (SVM). Models were constructed 
to predict clinically inactive disease (CID) or active 
disease (AD) at 6 and 12 months from gene expression at 
presentation (month 0) or gene expression at 4 months (month 
4). Models were built using data for all patients and separately 
for RF- patients only. Specifically, for each model, the 
relevant samples were randomly divided into a training group 
(two-thirds of available samples) and a test group (one-third of 
available samples), using a randomized block method to 
ensure even division between the groups of CID and AD 
samples. The test groups were then reserved for testing 
models, which were trained using only data from the training 
groups. Linear models were applied to each training group as 
described in the Statistical analysis section and used to identify 
genes that were differentially expressed between the AD and 
CID groups in the training groups. Subsets of these 
differentially expressed gene lists were used to train the models 
to predict AD or CID in patients based on differential 
expression between patients and controls. The e1071 package 
(15) in R was used as an interface for the LIBSVM library (16). 
Optimization of SVM parameters and the gene subset was done 
using a 10-fold cross-validation method on the training group. 
The subsets of each gene list to be used in each model were 
determined by starting with the 10 most significantly 
differentially expressed genes (by adjusted P value) in the 
relevant training group and sequentially adding genes up to a 
total of 200 genes. The performance of each model was 
assessed using receiver operating characteristic curves. 
Optimized SVM parameters and gene lists were then used to 
build models. The resulting models were then used to predict 
outcome (CID or AD) for the test groups, and the predictions 
were analyzed for accuracy and rates of false-positive results. 
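
The sample division and gene subset search described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study (which was written in R): the function names are hypothetical, the "randomized block method" is approximated by shuffling within each outcome class, and genes are assumed to be added one at a time, since the step size is not stated.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices into training and test sets within each class.

    Shuffling inside each outcome class (CID, AD) keeps the class
    proportions roughly equal in the two groups.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


def candidate_gene_subsets(ranked_genes, start=10, stop=200):
    """Yield nested subsets of a gene list ranked by adjusted P value.

    Starts with the top `start` genes and adds one gene at a time
    (an assumed step size) up to `stop` genes.
    """
    for size in range(start, min(stop, len(ranked_genes)) + 1):
        yield ranked_genes[:size]


# Hypothetical outcome labels (illustration only).
labels = ["CID"] * 12 + ["AD"] * 9
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Each candidate subset would then be scored by 10-fold cross-validation on the training indices only, so that the test group plays no part in choosing the genes or the SVM parameters.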



Statistical analysis. All statistical analyses were carried 
out in R (www.r-project.org). To facilitate statistical analyses 
relative to healthy controls, it was necessary to combine data 
from different hybridization batches. Due to the difference in 
the microarrays, it was necessary to create combined datasets 
using only the probes that were present on both array formats. 
Illumina probe identification numbers were used to identify 
39,426 common probes. Datasets were variance-stabilized and 
normalized using robust spline normalization via the Lumi 
software package (17,18). Batch effects were removed using 
the ComBat algorithm in the SVA software package (19). Prior 
to statistical analysis, nonresponding probes were filtered out 
of the datasets using the detection P value provided by the 
Illumina quality control metrics to eliminate probes not re- 
sponding at higher than background levels. Analysis of differ- 
ential gene expression patterns between patients and controls 
was performed using the Limma software package (20,21). The 
false discovery rate (FDR) was estimated using the method 
described by Benjamini and Hochberg (22). Statistical signifi- 
cance of gene expression was determined at an FDR of <0.05. 
Gene lists of interest were exported from R and uploaded to 
Ingenuity IPA software for further functional analysis. 
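
For readers unfamiliar with the FDR adjustment mentioned above, the Benjamini and Hochberg step-up procedure can be written in a few lines. This is a generic sketch in Python, not the R code used in the study; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order.

    Each P value is scaled by m / rank, and a running minimum taken from
    the largest P value downward keeps the adjusted values monotonic.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values for 4 probes (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```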

RESULTS 

Segregation of JIA patient samples from healthy 
control samples. Principal components analysis (PCA) 
of normalized signal data for month 0 in JIA patients 
and healthy controls showed that the gene expression 
profiles clearly separated the patients from the controls. 
However, although RF- and RF+ samples showed some 
separation on PCA, they were not perfectly segregated 
(Figure 2A). A strong batch effect between the first and 
second batches of the TREAT in JIA samples was also 
noted in Figure 2A. PCA of data after application of the 
ComBat algorithm showed that this procedure successfully 
removed the batch effect (Figure 2B). 

Figure 3. Heatmap showing gene expression levels at month 0 in juvenile idiopathic arthritis patients positive or negative for rheumatoid factor 
(RF) and in healthy controls, using 250 probes for significantly differentially expressed genes. Data shown are the log ratio for differential expression 
relative to the mean of the healthy controls (false discovery rate <0.05; absolute fold change >1.4). The dendrogram shown is a hierarchical 
clustering of patient samples using Euclidean distance on the 250 probes. 
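
The idea behind the batch adjustment shown in Figure 2 can be conveyed with a deliberately simplified sketch. ComBat itself uses an empirical Bayes model that adjusts both location and scale; the code below is not ComBat and only re-centers each batch for a single probe. The function name and values are hypothetical.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so that its mean equals the overall mean.

    A location-only adjustment for one probe; a simplified stand-in
    for the idea of batch correction, not the ComBat algorithm.
    """
    overall = mean(values)
    batch_means = {
        batch: mean(v for v, b in zip(values, batches) if b == batch)
        for batch in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Hypothetical log-scale signals for one probe in 2 batches.
signals = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch_ids = ["batch1"] * 3 + ["batch2"] * 3
print(center_by_batch(signals, batch_ids))
```

A simple re-centering of this kind would also remove any true biologic difference that happens to coincide with batch, which is one reason a model-based method is generally preferred.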

Differential gene expression analysis. A heatmap 
of genes selected at an FDR of 0.05 and a minimum fold 
change of 1.4 for differential expression in either the 
RF- or RF+ groups for each patient sample at month 
0 and for the healthy control subjects is shown in Figure 
3. Differential gene expression was relatively homogeneous 
across the patient samples, and there was no 
striking difference in differential gene expression between 
RF- and RF+ patients. The maximum fold 
change was 10-fold in either direction, but the 
majority of genes were differentially expressed by no more 
than 3-fold in either direction. (A full list of genes 
differentially expressed between all month 0 samples and the 
healthy controls at an FDR of 0.05 irrespective of RF 
status or fold change is available upon request from the 
corresponding author.) 

For further functional analysis, gene lists were 
selected at an FDR of 0.05 and a minimum fold change 
of 1.4. While large numbers of genes were significant at 
a low FDR, absolute fold change levels were low, and 
filtering by fold change drastically reduced the numbers 
of genes that were declared significant. At month 0, 125 
genes were differentially expressed in RF- JIA samples, 
while 237 genes were differentially expressed in the 
RF+ JIA samples. At month 4, 123 genes were 
differentially expressed in the RF- samples, while 110 genes 
were differentially expressed in the RF+ samples. 

Of the genes differentially expressed at month 0 
relative to the healthy controls, 90 genes represented by 
98 probe sets were significantly differentially expressed 
in both RF+ and RF— patients. 
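
The gene list selection and the overlap between the RF- and RF+ lists can be sketched as a filter followed by a set intersection. The sketch assumes fold changes are held on the log2 scale and treats the 1.4-fold threshold as inclusive; the function name, gene labels, and values are hypothetical.

```python
import math


def select_genes(results, fdr_cutoff=0.05, min_fold_change=1.4):
    """Select genes passing both the FDR and the fold change filters.

    `results` maps a gene label to (log2 fold change, adjusted P value).
    The fold change filter is applied to the absolute value, so both
    up-regulated and down-regulated genes are retained.
    """
    min_log2_fc = math.log2(min_fold_change)
    return {
        gene
        for gene, (log2_fc, adj_p) in results.items()
        if adj_p < fdr_cutoff and abs(log2_fc) >= min_log2_fc
    }


# Hypothetical results versus healthy controls (illustration only).
rf_negative = {
    "GENE_A": (1.0, 0.01),
    "GENE_B": (0.2, 0.001),   # significant, but fold change too small
    "GENE_C": (-0.8, 0.04),
    "GENE_D": (2.0, 0.20),    # large fold change, but not significant
}
rf_positive = {
    "GENE_A": (1.2, 0.02),
    "GENE_C": (-0.3, 0.01),
    "GENE_E": (0.9, 0.03),
}
shared = select_genes(rf_negative) & select_genes(rf_positive)
print(sorted(shared))
```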

Given the complex composition of whole blood 
specimens, it is difficult to make testable inferences 
about disease pathogenesis from the expression profiling 
patterns. When functional associations of the genes 
differentially expressed between JIA patients and 
healthy controls were explored using the Ingenuity software 
package, predictable numbers of genes associated 
with immunologic disease (n = 18), inflammatory disease 
(n = 16), and connective tissue disorders (n = 15) 
were identified, including 12 genes associated with 
rheumatoid arthritis (CD3D, CD97, CYP4F3, FOXO3, 
GNLY, GRN, HSPA1A/HSPA1B, MMP9, PADI4, 
SORL1, UBE2H, and WNK1). It was interesting to note 
that 21 differentially expressed genes were related to 
cancer, in light of the emerging data regarding an 
"inflammatory signature" in the gene expression profile of 
many tumors (23,24) and the cancer-like behavior of 
rheumatoid synovial cells (25). Other genes fell 
into categories of connective tissue disorders, immunologic 
disease, and inflammatory disease. 

Figure 4. Validation of the microarray results by real-time quantitative reverse transcription-polymerase chain reaction analysis of 9 genes in an 
independent cohort of 8 children with untreated, rheumatoid factor (RF)-negative polyarticular juvenile idiopathic arthritis (JIA) and in 8 healthy 
controls from the initial cohort. Statistical analysis was performed on $\Delta C_t$ values using unpaired t-tests. Values are the mean ± SEM. * = P < 0.05; 
** = P < 0.01 versus healthy controls. 

Validation of the gene expression results. To 
confirm the differences in gene expression between 
patients and healthy controls observed in the microarray 
experiments, real-time qRT-PCR was performed. Nine 
genes known to be associated with rheumatoid disease 
and shown to be significantly differentially expressed 
(FDR <0.05) between the RF- patients and the healthy 
controls in the microarray analysis (CD44, EXOC4, MIF, 
NFKB1, PADI4, PARP1, PTPRC, SERPINA, and 
STAT6) were analyzed by real-time qRT-PCR in an 
independent cohort. Figure 4 shows that 7 of 9 genes 
differentially expressed in the microarray analysis were 
also differentially expressed in the real-time qRT-PCR. 
MIF was differentially expressed on qRT-PCR, but in the 
opposite direction to the microarray result: the real-time 
qRT-PCR results showed a relative increase in MIF expression 
in the independent cohort, while the microarray results 
showed a decrease in MIF expression in the TREAT in JIA cohort. 

Prediction of disease status at 6 and 12 months. 
A schematic representation of the treatment regimens 
and outcomes in the JIA patients is shown in Figure 1. 
Consistent with the overall findings in the TREAT in 
JIA study (12), the findings in the subset of samples 
analyzed herein strongly suggested that for the RF- 
patients, there was a relationship between early use of 
combined treatment and the attainment of a positive 
outcome at 12 months. However, this pattern was not 
apparent in the RF+ patients. 

Eight models were built with support vector 
machines to predict disease status (CID or AD) at 
month 6 or month 12 using the gene expression data in 
patients at month 0 or month 4 and using either all of the 
samples or only the RF- samples. A total of 28 RF- 
and 16 RF+ arrays were available at month 0, and 32 
RF- and 17 RF+ arrays at month 4. For each model, a 
different number of genes was found to give the opti- 
mum predictive power; this ranged from 12 for the 
model using month 4 data in RF- patients to predict the 
month 12 outcome, to 120 for the model using month 0 
data in all patients to predict the month 12 outcome. 
(Details of the genes selected for each model are 
available upon request from the corresponding author.) 

Figure 5 shows the receiver operating character- 
istic curves, where CID is considered the positive out- 
come, for the 4 models built with month 0 data, and 
Table 1 gives the areas under the curve for all 8 models. 
The model using month 0 gene expression to predict 
CID at 6 months for RF- patients was able to perfectly 
classify the 9 samples in the test group (Table 1). The 
equivalent model for all patients (RF+/RF-) correctly 
classified 11 of the 14 samples tested, with a 
false-positive rate of 0.12. The 2 models using month 0 data to 
predict CID at month 12 were able to achieve accuracies 
of ~70%; however, the RF- model had a false-positive 
rate of 0.33, while the RF+/RF- model had a 
false-positive rate of 0.80. Models using month 4 data were 
not able to significantly improve on chance, with 
accuracies between 40% and 60%. 
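
The area under the ROC curve reported for each model can be computed directly from classifier scores without drawing the curve. The sketch below uses the rank interpretation of the AUC (the probability that a randomly chosen CID sample scores higher than a randomly chosen AD sample); the function name and scores are hypothetical, and CID is taken as the positive outcome as in the text.

```python
def roc_auc(scores, labels, positive="CID"):
    """Area under the ROC curve from pairwise score comparisons.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half.
    """
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores for a small test group.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = ["CID", "CID", "AD", "CID", "AD", "AD"]
print(roc_auc(scores, labels))
```

An AUC of 1.0 corresponds to a classifier that ranks every CID sample above every AD sample, and a value near 0.5 corresponds to chance.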

Figure 5. Receiver operating characteristic (ROC) curves for 4 
models built using gene expression data obtained at month 0 from 
juvenile idiopathic arthritis (JIA) patients and healthy controls. Data 
obtained at month 0 from the entire group of JIA patients, whether 
positive or negative for rheumatoid factor (RF), were used to predict 
outcome at month 6 (A) and at month 12 (B). Data obtained at month 
0 from only the RF- JIA patients were used to predict outcome at 
month 6 (C) and month 12 (D). The ROC curves show the relationship 
between changes in the true-positive rate and the false-positive rate as 
the classification threshold is varied. The better the performance of the 
classifier the greater the area under the curve. An entirely random 
classifier would generate a line at y = x. A perfect classifier is typified 
by the ROC curve shown in C. 

Table 1 also shows the accuracies and false-positive 
rates achieved when treatment with the combined 
MTX plus etanercept regimen was used as the 
predictive feature. Gene expression was the better 
predictor of outcome at month 6 based on month 0 data for 
both the RF+/RF- group and the RF- only group. For 
the prediction of month 12 outcome based on month 0 
data in RF+/RF- patients, gene expression provided a 
more accurate predictor than did MTX plus etanercept 
therapy. However, for the analysis of only the RF- 
patients, the accuracies were similar, but using MTX 
plus etanercept as the predictor resulted in a much lower 
false-positive rate. As with gene expression, prediction 
based on treatment at month 4 was no better than 
chance. 
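
The accuracy and false-positive rate for a treatment-based predictor can be illustrated as follows. The text does not spell out how treatment was used as the predictive feature; this sketch assumes the simplest rule (predict CID if the patient received the combined regimen, AD otherwise). The function names and patient data are hypothetical.

```python
def predict_from_treatment(received_combination):
    """Predict CID for patients on the combined regimen, AD otherwise.

    An assumed decision rule, used here for illustration only.
    """
    return ["CID" if received else "AD" for received in received_combination]


def accuracy_and_fpr(predicted, actual, positive="CID"):
    """Return (accuracy, false-positive rate) with `positive` as the event.

    The false-positive rate is the fraction of truly negative (AD)
    samples that were predicted to be positive (CID).
    """
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return accuracy, fpr


# Hypothetical treatment assignments and month 6 outcomes.
combined = [True, True, True, False, False, False, True, False]
outcome = ["CID", "CID", "AD", "AD", "AD", "CID", "CID", "AD"]
print(accuracy_and_fpr(predict_from_treatment(combined), outcome))
```

The same `accuracy_and_fpr` function applies unchanged to the predictions of a gene expression model, which is what allows the two predictors to be compared side by side in Table 1.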

DISCUSSION 

The completion of the Human Genome Project 
was heralded as the beginning of a new era of 
"personalized medicine." Technological spin-offs from the 
project, including gene expression profiling, have further 
added to the promise that individualized therapies can 
be developed based on genomic data (26). In this study, 
we used whole blood gene expression profiles to deter- 
mine whether we could predict therapeutic response in 
children enrolled in the TREAT in JIA study. We used 
an independent cohort of patients from Oklahoma to 
validate the statistical methods used to analyze the 
TREAT in JIA study samples. 

We found that, while it might be feasible to 
develop such predictive assays, the number of samples 
available from the TREAT in JIA study and the multiple 
different phenotypes (e.g., RF+ and RF-, as well as the 
2 different arms of the protocol with crossover to the 
"aggressive" side of the protocol for treatment failures 
at 4 months and at 6 months) made it difficult to develop 
statistically robust predictive models. Nevertheless, 
within these constraints, we were able to predict, based 
on expression profiling alone, the achievement of CID at 
6 months in the RF- patients. At 12 months, however, 
initial therapy with etanercept was a better predictor of 
disease response than was gene expression in the RF- 
patients (Table 1). While both treatment arms of the 
TREAT in JIA protocol demonstrated therapeutic effi- 
cacy (12), the use of etanercept as initial therapy, at least 
in the samples available to us for analysis, exerted the 
strongest influence on outcome at 12 months. However, 
the results from the support vector machine indicate that 
there is also a genomic component to prognosis. 

Table 1. Accuracy rates and FPRs for the prediction of clinical 
outcome at month 6 and month 12, based on either the gene 
expression model or the combined therapy with MTX plus etanercept 
initiated at month 0 or month 4* 

                              Gene expression model      MTX plus etanercept therapy
Patient group,
initiation -> outcome         Accuracy    FPR     AUC    Accuracy    FPR
RF+/RF- (all patients)
  Month 0 -> month 6            0.79      0.12    0.90     0.62      0.37
  Month 4 -> month 6            0.56      0.30    0.48     0.36      0.82
  Month 0 -> month 12           0.64      0.80    0.60     0.60      0.36
  Month 4 -> month 12           0.44      0.71    0.44     0.47      0.82
RF- patients
  Month 0 -> month 6            1.00      0       1.00     0.66      0.36
  Month 4 -> month 6            0.60      0.17    0.50     0.34      0.92
  Month 0 -> month 12           0.70      0.33    0.76     0.68      0.28
  Month 4 -> month 12           0.50      1.00    0.48     0.51      0.89

* Predictions were made for the entire cohort of 62 patients, including 
those positive and those negative for rheumatoid factor (RF), as well 
as for the subset of 41 RF- patients. The area under the curve (AUC) 
provides a quantitative measure of the performance of the model. 
FPRs = false-positive rates; MTX = methotrexate. 

Our findings do not preclude the possibility of 
developing a longer-term predictive model of 
sustainability of therapeutic response based on gene expression 
patterns if a larger patient cohort, one with more 
samples from patients in each arm of the protocol, were 
available. Particularly helpful in this regard was the 
analysis of the baseline samples. While there are some 
distinct phenotype differences between RF+ and RF- 
children with JIA, the expression profiles between RF+ 
and RF- children with JIA were remarkably similar, 
with overlap between the groups on hierarchical cluster 
analysis (Figure 3). These findings seem to support those 
of earlier studies showing that RF expression in children 
is more ubiquitous than is typically considered and is 
dependent more on the assay used to detect RFs than on 
their actual prevalence in the JIA population (27,28). 
In any case, although the TREAT in JIA 
study may have experienced some recruitment bias in 
favor of RF+ patients (35% of the TREAT in JIA study 
subjects were RF+), it appears to be feasible to group 
RF+ with RF- patients in future attempts to develop 
expression-based predictive models or assays. Further- 
more, it was interesting to note the degree of homoge- 
neity among and between the patient groups at the gene 
expression level (Figure 2). These findings suggest that a 
broad range of interpatient variability will not be a 
serious impediment to developing expression-based pre- 
dictive assays in the future. 

It should also be noted that both arms of the 
TREAT in JIA protocol were more aggressive than 
protocols that have previously been used in the routine 
clinical setting. The MTX dosage of 0.5 mg/kg/week 
(with a maximum dosage of 40 mg/week) is higher than 
the more standard dosages of 10-20 mg/m² orally with a 
weekly maximum of 25 mg. Whether the models would 
be more predictive in the setting of current clinical 
practice is unknown. Furthermore, the findings of the 
TREAT in JIA study, and particularly the degree to 
which the higher doses of MTX were tolerated in 
children, may provide an impetus to change clinical 
practice and, thus, obviate the need to test predictive 
models using lower oral doses of MTX. 

As is commonly seen in gene expression profiling 
in rheumatic diseases, a broad spectrum of functional 
associations were seen among the differentially ex- 
pressed genes, including genes associated with inflam- 
mation (NFKB, MAPK), cancer (BCL2), and adaptive 
immunity. The expression profiles do suggest complex 
interactions between innate and adaptive immunity that 
are not subsumed under any single prevalent theory 
concerning the pathogenesis of JIA. 

In conclusion, we have provided evidence that it 
is feasible to develop models of disease pathogenesis 
based on patterns of gene expression for the purpose of 
predicting outcome at 6 months. However, in the subset 
of samples from the TREAT in JIA study with which we 
worked, the use of etanercept was as good as or better 
than gene expression as a predictor of a patient's 
achieving CID by 12 months. Future translational stud- 
ies will likely require larger numbers of patients in order 
to develop clinically usable predictive assays. 